Efficient Method for Mining Patterns from Highly Similar and Dense Database based on Prefix-Frequent-Items

Authors

Meng Han

Zhihai Wang

Jidong Yuan

Abstract

In recent years, a great deal of effort has gone into sequential pattern mining, but some challenges remain unresolved, such as large search spaces and ineffectiveness in handling highly similar, dense, and long sequences. This paper focuses on designing effective search space pruning methods to accelerate the mining process. We present a novel structure, the Prefix-Frequent-Items Graph (PFI-Graph), which represents the prefix frequent items of other items in sequential patterns. We propose an efficient algorithm based on the PFI-Graph, PFI-PrefixSpan (Prefix-Frequent-Items PrefixSpan). It avoids redundant data scanning and can thus effectively speed up the discovery of new patterns. Extensive experimental results on synthetic and real sequence datasets show that the proposed approach is substantially more efficient than PrefixSpan with physical-projection and pseudo-projection, especially for dense and highly similar sequence databases.
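For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of plain PrefixSpan with pseudo-projection, not the PFI-Graph or PFI-PrefixSpan, whose details the abstract does not give. It assumes each sequence is a list of single items (no itemsets), and the names `prefixspan`, `mine`, and `min_support` are chosen here for illustration only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for every frequent sequential pattern.

    Baseline PrefixSpan with pseudo-projection: a projected database is a
    list of (sequence index, start offset) pairs, so no suffix is copied.
    """
    results = {}

    def mine(prefix, projected):
        # For each item, collect one entry per sequence whose suffix
        # contains it, pointing just past the item's first occurrence.
        next_projection = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    next_projection[item].append((seq_id, pos + 1))

        # Extend the prefix with each frequent item and recurse.
        for item in sorted(next_projection):
            occurrences = next_projection[item]
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                mine(pattern, occurrences)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    # Hypothetical toy database, used only to exercise the sketch.
    db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
    for pattern, support in prefixspan(db, min_support=2).items():
        print(pattern, support)
```

Every recursive call rescans the suffixes in its projected database to count candidate items. That repeated scanning is the kind of cost the abstract says the PFI-Graph is meant to avoid, presumably by recording in advance which items can frequently extend a given prefix.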


Similar resources

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...


High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...


Single-pass incremental and interactive mining for weighted frequent patterns

Weighted frequent pattern (WFP) mining is more practical than frequent pattern mining because it can take into account the different semantic significance (weight) of the items. For this reason, WFP mining has become an important research issue in data mining and knowledge discovery. However, existing algorithms cannot be applied for incremental and interactive WFP mining and also for stream data mining becaus...


Using and extending itemsets in data mining: query approximation, dense itemsets, and tiles

Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated...


CT-ITL : Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth

Discovering association rules that identify relationships among sets of items is an important problem in data mining. Finding frequent item sets is computationally the most expensive step in association rule discovery and therefore it has attracted significant research attention. In this paper, we present a more efficient algorithm for mining complete sets of frequent item sets. In designing ou...



Volume 9

Publication date: 2014
